REMARKS 

Claims 1-9 have been cancelled without prejudice. Claims 10-32 have been 
added. It is respectfully submitted that the new claims are fully disclosed and supported 
in the original specification and that no new matter has been added. The Applicants 
respectfully request reconsideration of this application in view of the above amendments 
and the following remarks. 

Specification 

Applicants submit herein proposed amendments to correct informalities. These 
amendments add no new matter. 

Conclusion 

For all the reasons cited above, Applicant respectfully submits that the claims are 
in condition for allowance. The Examiner is requested to call Brent E. Vecchia at (303) 
740-1980 if there remains any issue with allowance of the case. 
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Request for an Extension of Time 



The Applicant respectfully petitions for an extension of time to respond to the 
outstanding Office Action pursuant to 37 C.F.R. § 1.136(a) should one be necessary. 
Please charge our Deposit Account No. 09-0457 to cover the necessary fee under 37 
C.F.R. § 1.17 for such an extension. 

Charge our Deposit Account 

Please charge any shortage to our Deposit Account No. 02-2666. 



Respectfully submitted, 



BLAKELY, SOKOLOFF, TAYLOR & ZAFMAN LLP 



Date: July 17, 2001 




Brem E. Vecchia 
Reg. No. 48,011 



12400 Wilshire Boulevard 
Seventh Floor 

Los Angeles, California 90025-1026 
(303) 740-1980 
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VERSION WITH MARKINGS TO SHOW CHANGES MADE 



In The Abstract: 

The abstract beginning at page 40, line 2 has been amended as follows: 



— A method and apparatus [are provided] for executing [scalar] partial-width 
packed data instructions are discussed . The processor may include [According to 
one aspect of the invention, a processor includes] a plurality of registers, a register 
renaming unit [coupled to the plurality of registers], a decoder [coupled to the 
register renaming unit], and a partial- width execution unit [coupled to the 
decoder]. The register renaming unit provides an architectural register file to 
store packed data operands each of which include a plurality of data elements. 
The decoder is [configured] to decode a first and second set of instructions that 
each specify one or more registers in the architectural register file. [Each of the 
instructions in the] The first set of instructions specify operations to be performed 
on all of the data elements stored in the one or more specified registers. In 
contrast, [each of the instructions in] the second set of instructions specify 
operations to be performed on only a subset of the data elements [element stored 
in the one or more specified registers]. The partial-width execution unit is 
[configured] to execute operations specified by either [of] the first or the second 
set of instructions. ~ 



Docket No. 42390p5193C 
Application No.: 09/852,217 



16 



In The Specification: 

The paragraph beginning at page 9, line 9 has been amended as follows: 

- Another situation in which it is undesirable to force a packed data 
instruction to return individual results for each pair of data elements includes 
arithmetic operations in an environment providing partial-width hardware. Due to 
cost and/or die limitations, it is common not to provide full support for certain 
arithmetic operations, such as divide. [The] By its nature, the divide operation is 
very long, even when full-width hardware (e.g., a one-to-one correspondence 
between execution units and data elements) is [implemented,] implemented . 
Therefore, in an environment that supports only full-width packed data operations 
while providing partial- width hardware, the latency becomes even longer. As will 
be illustrated further below, a partial-width packed data operation, such as a 
partial-width packed data divide operation, may selectively allow certain portions 
of its operands to bypass the divide hardware. In this manner, no performance 
penalty is incurred by operating upon only a subset of the data elements in the 
packed data operands. ~ 

- Additionally, exceptions raised in connection with extraneous data elements 
may cause confusion to the developer and/or incompatibility between SISD and 
SIMD machines. Therefore, it is advantageous to report exceptions for only those 
data elements upon which the instruction is meant to operate. Partial-width 
packed data instruction support allows a predictable exception model to be 
achieved by limiting the triggering of exceptional conditions to those raised in 
connection with the data elements being operated uponf.J^or in which exceptions 
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produced by extraneous data elements would be likely to cause confusion or 
incompatibility between SISD and SMD machines. — 

As described above, packed data operations may be implemented as two half 
width micro instructions (e.g., a high order operation and a low order operation). 
Rather than independently decoding the macro instruction by two decoders to 
produce the high and low order operations as would be typically required by prior 
processor implementations, as a feature of the present embodiment both micro 
instructions may be generated by the same decoder. In this example, this is 
accomplished by replication logic 1150 which replicates either the high or low 
order operation and subsequently modifies the resulting replicated operation 
appropriately to create the remaining operation. Importantly, as was described 
earlier, by carefully encoding the register address space, the registers referenced 
by the micro instructions (e.g., the logical source and destination registers) can be 
made to differ by a single bit. As a result, the modification logic 1 160 in its most 
simple form may comprise one or more inverters to invert the appropriate bits to 
produce a high order operation from a low order operation and vice versa. In any 
event, the replicated micro instruction is then passed to multiplexer 1170. The 
multiplexer 1 170 also receives a micro instruction produced by decoder 1 120. In 
this example, the multiplexer 1170, under the control of a validity decoder 1180, 
outputs the replicated micro instruction for packed data operations (including 
partial-width packed data operations) and outputs the micro instruction received 
from decoder 1 120 for operations other than packed data operations. Therefore, it 
is advantageous to optimize the opcode map to simplify the detection of packed 
data operations by the replication logic 1 150. For example, if only a small portion 
of the [of the] macro instruction needs to be examined to distinguish packed data 
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operations from other operations, then less circuitry may be employed by the 
validity decoder 1 180. - 



-In an implementation that passes through source data elements to the logical 
destination register for purposes of executing partial-width packed data 
operations, in addition to selection logic similar to that described with respect to 
Figures 5A and 5C, logic may be included to eliminate ("kill") one of the high or 
low order operations. Preferably, for performance reasons, the extraneous micro 
instruction is eliminated early in the pipeline. This elimination may be 
accomplished according to the embodiment depicted by using a micro instruction 
selection signal output from micro instruction length determination circuitry 1190. 
The micro instruction length determination logic 1190 examines a portion of the 
macro instruction and produces the micro instruction selection signal which 
indicates a particular combination of one or more micro instructions that are to 
proceed down the pipeline. In the case of a scalar SMD instruction, only one of 
the resulting high and low order operations will be allowed to proceed. For 
example, the micro instruction selection signal may be represented as a bit mask 
that identifies those of the micro instructions that are to be retained and those that 
are to be eliminated. Alternatively, the micro instruction selection signal may 
simply indicate the number of micro instructions from a predetermined starting 
point that are to be eliminated or retained. Logic required to perform the 
elimination described above will vary depending upon the steering mechanism 
that guides the micro instructions through the remainder of the pipeline. For 
instance, if the micro instructions are queued, logic [would] may be added to 
manipulate the head and tail pointers of the micro instruction queue to cause 
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invalid micro instructions to be overwritten by subsequently generated valid 
micro instructions. Numerous other elimination techniques will be apparent to 
those of ordinary skill in the art. 

In The Claims: 

Please cancel claims 1-9. 

Please add Claims 10-32. 
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